SEMANTIC SVOIDS: DON'T SHOOT THE TRANSLATOR
Al FOR THE LINGUIST...... A. J. Salemme


Classified by DIRNSA/CHCSS
Exempt from [illegible]
[illegible] Notification by the Originator


Declassified and Approved for Release by NSA on 04-13-2021 pursuant to E.O. 13526, MDR Case #109435 




Published Monthly by P1, Techniques and Standards,
for the Personnel of Operations





VOL. III, No. 9 SEPTEMBER 1976 














The following article in C-LINERS
(C Group Machine Processing Information
Bulletin), Vol. 3, No. 9, Issue No. 30,

As was the case with Mark Twain, reports
you might have heard about the demise of TIPS
-- NSA's Technical Information Processing System
-- have undoubtedly been grossly exaggerated.


True, TIPS is showing signs of aging. After 
all, she has been around since the mid-Sixties. 
To some, such longevity should qualify her for 
some kind of geriatric support. To others, 
notably some folks in C11 (the Information Sys-
tems Division of C), the old girl is still very
much alive. Admittedly, a young and more glam- 
orous replacement is being sought. Nobody knows 
when this rival will be embodied (or "em-machined") 
but she is coming and plans are being made for 
her arrival. 


At this stage in her career, then, one feels 
it would be a good time to record a few random 
thoughts about TIPS. A few words of background 
information may be in order for those not in 
the category of C Old-timers. First, a more or 
less official definition: 










TIPS is a part of RYE, a UNIVAC 494 
remote-access, machine processing sys- 
tem at NSA. The TIPS system encompasses 
the hardware devices, software executive 
routines, conventions, communications 
package, and data bases in support of 
the quick-turnaround, on-line, informa-
tion storage and retrieval capability
within RYE. (See Section 4, of forth- 
coming USSID 703, Technical Information 
Processing System (TIPS), for general 
information about this system.) 


Under the TIPS concept the data files are 
stored on CDC-606 tape drives. They are con- 
nected to Honeywell DDP-516 mini-computers, 
which execute the queries submitted by the vari- 
ous users. A query is a series of simple state- 
ments written in TILE (TIPS Interrogation Lan- 
guage). Normally, it might consist of just a 
retrieval command and a display statement. 
Queries are entered remotely from a teletype 
terminal, or some other peripheral device, like 
a Raytheon CRT or a Bostic paper-tape reader. 


Chances are that, as a RYE user, you've 
already "interfaced" with the most common of 
these input devices, the lowly teletype. The 
manufacturer is the Teletype Corporation of 
America, and the most common terminal type (for 
RYE) is the ASR (Automatic Send and Receive)
Model 35, or simply "Mod 35" to the old RYE
hands. (Incidentally, you might well impress
some of the data systems newcomers at the Agency
by reminding them that all teletypes are tele-
printers but not all teleprinters are teletypes
-- especially if you've heard them talking about
"xeroxing" their output, with a small "x"!)


"ASR" means you can input a paper tape in
the reader while at the same time receiving
page-print at 72 characters a line. The read-
ing speed certainly isn't the fastest by today's
standards -- 10 characters per second, or about
100 WPM. By contrast, the Bostic units (each
controlled by an ASR-35 terminal) can input at
a rate of about 300 characters per second and
punch paper tape at a speed of 105 characters
per second.


The Mod-35s are flexible, however, and easy
to operate. There are nearly 100 of them in
operational NSA spaces (including FANX) hooked
into RYE. This figure doesn't include the
units belonging to C system organizations.
Note also that it doesn't include the consider-
able number of Mod-35s connected to TIDE (Time
Dependent System) housed in a similar U-494
main frame. The breakdown for ownership of
the RYE-connected units is as follows:

    [illegible] - 24      G - 13
    [illegible] -  2      [illegible] -  2
    [illegible] -  2      [illegible] - [illegible]
    [illegible] -  2      [illegible] -  8
    E - 10                W - 15


The above list doesn't include the non-NSA,
service intelligence organizations, i.e.
AFINAR (Air Force Intelligence and Research),
USASRD (U.S.A. Special Research Detachment),
and NFOIO (Naval Field Operations Intelligence
Office), each of which has a terminal and can
access selected TIPS/COINS files.


Incidentally, if you have come aboard the
Agency fairly recently, you may be confused
about the relationship of TIPS to COINS. TIPS
is the NSA node of the Community On-Line Intel-
ligence System (COINS) -- a network of intelli-
gence-community computers which have been in
place since 1967 in either pilot-experimental
or final-operational mode. COINS currently
interfaces with the NSA system through a "store-
and-forward" switch, housed in a PDP-11 main
frame and located at the DIA. In turn, COINS
interfaces with the so-called IDHS (Intelligence
Data Handling System), which links a number of
remote customers indirectly to TIPS, the NSA
System. These include:

• Air Force Intelligence (AFIN), Pentagon;
• Naval Ocean Systems Center (NOSIC),
  Suitland, Maryland; and
• FICEURLANT, Norfolk, Virginia.

They also include such very remote customers as:

• Air Defense Command (ADC), Colorado Springs;
• European Command (USEUCOM), Vaihingen,
  Germany; and
• organizations subordinate to the Pacific
  Command (PACOM), such as:
    • PACAF, Hickam AFB, Hawaii;
    • CINCPAC/IPAC HQ, Camp Smith, Hawaii;
    • COMUSKOREA, Yongsan, Korea.


Now active on the TIPS system are about 40
separate SIGINT files, not including about 15
support files for accounting, user-aid, and test
purposes. They are managed by owners spread
[...]


Some of these 40 TIPS files are on the
"technical" side. Admittedly, they are designed
more for the SIGINT producer than [...]


SIGINT users, like [...]
the USAFSS at Kelly AFB, Texas,
have terminals connected to TIPS. They operate
like regular TIPS customers, i.e., they are
linked directly to the RYE master machine, and
from there to the TIPS data bases to which
they have been given access.


The second major category of TIPS files
could be termed the "intelligence" type. These
are files available to both NSA users and the
intelligence community through COINS. One big
subset of the "intelligence" files [...]


The TIDE file carries the latest 20 days of
intercept; the TIPS files, about 90 days of
data, 21 or more days old.


A version of SIRE, the SIGINT Requirements
file managed by V1, is off-loaded to TIPS for
the benefit primarily of remote SIGINT producers
and collectors. (SIRE is actually maintained
through the SOLIS system.)








At this writing, the volume of the
TIPS system totals about 300 million characters.
This volume is more than we once thought could
be accommodated. If you are a potential owner
of a new TIPS application, this shouldn't neces-
sarily discourage you, unless you are thinking
of submitting a large file for consideration.

A number of current files are scheduled to be
removed from the system. This should free some
space for future applications. Call C12 for
more information about available space.


To many potential TIPS/COINS users at NSA,
a perennial problem has been, "Where can I find
a convenient RYE outstation from which to input
my queries?" Terminals are apt to be located
mostly in machine rooms that may or may
not be handy to your office. Admittedly, this
is far from the ideal situation. Part of the
problem is that Mod-35s tend to be a little
noisy and not conducive to a serene office at-
mosphere. Perhaps one solution would be a
telephone call to your friendly machine support
unit (e.g., G8 for G files, or B42 for B appli-
cations), which could make the query for you.
In any event, C113 (5203s), which functions as
a TIPS customer support unit, in addition to
staffing the NSA COINS Subsystem Manager's of-
fice, can guide you toward the nearest out-
station, or make a query for you.


Another recurring problem has to do with
the terminal equipment itself. Like all of us,
the Model-35s are aging; at the same time they 
are occasionally a prey to gremlins and mysteri- 
ous poltergeists. Maintenance service, however, 
is normally prompt. Call RYE operations, 5935s, 
to enter a complaint and arrange for servicing. 


Do you need a beginning or refresher course 
in TILE (TIPS Interrogation Language)? Formal 
training is offered aperiodically by the 
National Cryptologic School (MP-175). If you 
are interested in such a class, call E21 (8555s)
or C11. The COINS Project Management Office
does a good deal of informal training in TILE,
as well as in other languages used to access COINS
files at DIA, through its user support
group. You can reach them at 1108s. C113 can
also arrange informal training/briefings for
potential users, or analysts who would like to
have a refresher in any aspects of TILE.


At this writing the long-term future of TIPS 
is hazy. We are still in the era of TIPS I. 
Will there be a TIPS II? No one is saying, and 
presumably a final decision has not been made. 
At the moment one assumes that the TIPS II ma-
chine, or whatever system replaces her, will be
strongly interactive -- with all the features, 
and high overhead, that this capability entails. 
But one hopes, however, that the time-honored 
batch mode will not be scrapped completely. A 
lot of us old-timers (and some younger ones who 
have grappled with day-by-day general processing 
at NSA) like to point out a simple but note- 
worthy fact: the remote-processing, batch-mode 
systems of yesterday are still here and still 
performing prodigies of labor for the Agency. 
One of these is SPECOL (Special Customer Oriented 





[...] Information Storage
and Retrieval) and data-processing system for 
several different machines, including the 360/370 


world. RYE/TIPS is another workhorse, limited 
to a UNIVAC main frame but reaching out as far 
as Europe and the Pacific to perform its IS&R 
and data-processing role. Note that both SPECOL
and TIPS are more than IS&R systems. They are
data-processors as well as retrievers, able to 
extract, sort, compute, format, and output in- 
formation in many different ways, and for many 
different kinds of users. Can you say the same 
for your interactive system? 


These random thoughts have not been aimed at 
disparaging the young DBM systems we are now 
ogling. Assuredly, they have a bright future at 
NSA, although their outlines are a bit murky 
yet. But let's not be in too big a hurry to 
divorce ourselves from the old batch-mode sys- 
tems that have served us so well for a long 
time. They deserve at least a glass or two 
raised in tribute. 


Cecil Phillips' article "Musings About the
AG-22/IATS" in issue No. 28 of C-LINERS [and 
reprinted in CRYPTOLOG, March 1976] caught my 
attention because of the recent work I have 
done on data bases built from AG-22/IATS (and, 
not too long ago, FF STRUM). I do not think 
that anyone would argue that computer records 
built totally automatically from AG-22 and 
IATS are of low quality because of poor plan-
ning or a lack of sophisticated programming.
The quality of the copy itself is obviously
to blame and I do agree with Mr. Phillips that
the place to solve the problem is at the source;
however, I disagree that more operator tags 
and less traffic is the solution. 


I cannot recall any change or addendum to 
Morse copy instruction, within the last 20 
years, aimed at lessening the amount of non- 
intercept data an operator must manually enter 
into his log (traffic service). In fact, the 
trend has been in the opposite direction. The 
implementation of AG-22 brought tagging; dim- 
inishing intercept resources demanded that we 
narrow our requirements and specifically define 
the collective objectives, and now COPES is 
with us. The Morse operator is, and always 
has been, overburdened and it seems to be 
some accepted rule of collection strategy that 
a position must be assigned more cases than it 
is humanly possible to cover. There is some-
thing drastically wrong with this rationale.


Most of our Morse operators from the Crypto- 
logic Services are placed on the job (on OJT) 
just out of school and complete their military 
enlistments with less than 24 months on posi- 
tion. There is absolutely no comparison be-
tween U.S. copy (from a quality standpoint) and
that of second- and third-party sources where 
genuine professionals are universally employed. 
I'm not trying to put down our operators; on 
the contrary, I think they do an exceptional 
job with the limited amount of experience they 
have on the average, but leaving important line- 
by-line copy decisions (i.e., summarizations) 
up to them, as Mr. Phillips has suggested, is 
not at all practical. Weak reception, high- 
speed transmissions, signals buried in QRM/N, 
coverage of both ends, etc., require intense 
concentration during copy by the operator.
Intercept of transmissions under such adverse
conditions is common and would hardly lend it-
self to a form of selective abbreviated copy.
Gisting that which is the usual also requires
that the experienced traffic analyst (or crypt-
analyst) do it, especially if enciphered pro-
cedures are used.


Reprinted from C-LINERS,
Vol. 3, No. 9, Spring 1976,
Issue No. 30


There are a number of important problems in 
this Agency today where "non-message" data is 
all that is "readable." Messages are either 
high-grade (unsolved) or practice, and hold 
little or no analytic value. Copy instructions 
for certain groups passing only practice some- 
times require only the first line or so since 
we anticipate these texts will be repetitions 
from recovered pages. The chatter, whether the 
messages are of value or not, may contain 
routing instructions, frequency and schedule 
references, authentication, message precedence, 
etc., each of which is essential in determining 
communications procedure, identifications, and, 
most important, network continuity. We have 
even had callsigns, selected in a particular 
order from a recovered page, used as time indi- 
cators in missile-launch countdowns. Order of 
callup has also been used to establish continu- 
ities. International Q and Z codes have been 
used for purposes other than originally intended. 
There are many examples of Q's and Z's being 
used to indicate precedence and authentication 
or other uses peculiar to a target's need. Ab- 
breviated plaintext chatter, prevalent on many 
problems, also requires special attention. Full 
copy of chatter, less the strings of V's, re- 
peated calls, etc., is important to any analysis 
to be performed on the material. 


Analysis of the collected data fluctuates ac- 
cording to need. When continuities are good and 
callsigns projectable, less emphasis is placed 
on the study of message externals. There are 
many traffic analysts here today who have never 
worked on a major target complex where the call- 
sign system in use was unknown. Traditional TA 
for some of them is an unknown art that will be 
painfully rediscovered when these callsign sys- 
tems change. We have enough trouble with rou- 
tine changes (within a recovery system), so 
let's not lose the remaining means for reestab- 
lishing the continuities when callsigns can no 
longer be used as the primary lead to an identi- 
fication or continuity. There is a great deal 
more to TA than callsign lookup, and it lies in 
the detailed study of all communications data, 
especially in the chatter. We should not omit 
or attempt to summarize this kind of information 
at the point of copy. 


September 76 * CRYPTOLOG * Page 12


Extra intermediate edit steps to fix the
traffic copy are no solution either. Manual
intervention to correct an automated process
seems to me to be a step in the wrong direction. 
We will always need people to prepare the input 
(operators) and others to study the output (ana- 
lysts). Mechanization should join these two 
functions; if we need to fix the system, let's 
do it on either or both these ends, not in the 
middle. Middlemen editing and/or correcting
large volumes of traffic with strict deadlines
to be kept contribute very little to the process,
since we are frequently forced to accept anyone 
from any specialty who happens to be available 
to do this job. Re-identification (except on 
search material), as part of a maintenance pro- 
cess performed back here, perpetuates, rather 
than corrects, a bad practice. When inter- 
cept identifications must be changed, this means 
that the wrong target was copied as mission. It 
is our responsibility to see to it that the op- 
erator gets the right information to acquire and 
identify his target properly in the first place. 
After-the-fact case corrections are useless to 
him if the SOIs, i.e., callsign periods, are 
short in duration. Any other type of correction 
alters the original copy, which should not be 
done for a multitude of reasons. File main- 
tenance, if it's worth doing, also requires 
quality control, which in turn requires time
and resources. 


The value of manually correcting traffic 
for data-base storage is questionable, since 
the case analyst is usually through with it at 
the close of the work day whether it is properly 
formatted or identified anyway. He has logged 
the important items from his material, updated 
his net diagrams, etc., and will probably never 
need the traffic again for the types of specific 
work he is required to do (e.g., TEXTA, in one 
of its many forms, which is, in turn, machined).
I think it is best that we continue to get the 
best possible operator copy and retain all of it 
and the identifications (his, any intermediate 
machine re-idents, and the final) that are 
placed on the stored copy. Under no circum- 
stances should we change the original version of 
copy or summarize any part of the data that 
does not follow a predictable pattern, such as 
valid messages and chatter, even if the instruc- 
tions (SCOLs, etc.) require no more traffic than 
is necessary for identification. If we need to 
de-dupe, let the programming handle it. 
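
The policy argued for above can be sketched in a few lines of Python. The sketch keeps the operator's copy untouched, stores every identification (the operator's, any intermediate machine re-idents, and the final) alongside it, and leaves duplicate removal to the program. The record layout, the field names, the sample case labels, and the use of a hash of the copy as the duplicate key are all illustrative assumptions, not a description of any actual AG-22/IATS data base.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identification:
    source: str  # hypothetical labels: "operator", "machine", "final"
    case: str    # hypothetical case designator


@dataclass
class TrafficRecord:
    copy: str                                   # original copy, never edited
    idents: list = field(default_factory=list)  # append-only ident history

    def add_ident(self, source: str, case: str) -> None:
        """Record a new identification without touching the copy."""
        self.idents.append(Identification(source, case))


def dedupe(records):
    """Collapse records whose copy is identical, merging their idents."""
    seen = {}
    for rec in records:
        key = hashlib.sha256(rec.copy.encode("utf-8")).hexdigest()
        if key not in seen:
            seen[key] = rec
            continue
        for ident in rec.idents:
            if ident not in seen[key].idents:
                seen[key].idents.append(ident)
    return list(seen.values())
```

Because the copy field is only ever read, a later re-identification adds to the history instead of overwriting what the operator wrote.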


So, this brings about another question: If 
the case analyst doesn't need a traffic data 
base in the normal conduct of his daily duties, 
who does? Management sometimes, for collection 
studies from data not available anywhere else, 
but this certainly does not justify a data base. 
The real user is the research analyst studying 
larger masses of both identified and unidentified 
traffic. Here is where new ideas and analytic 
concepts are born. It is for this reason, and 
perhaps this reason alone, that we maintain a 
number of these data bases. I think we are 
justified in doing this for just this purpose, 


but I would be hard put to support my conclusion. 

I suspect that I would have no case at all if I
had to provide supporting evidence based on past
usage of the data bases or the applications of
special research, beyond case analysis, which
is often sacrificed in favor of continuing proj-
ects and the fulfillment of day-to-day commit-
ments.

The potential uses of these data bases 
through readily available programs should bolster 
the TA imagination and create new approaches if 
properly publicized and if adequate indoctrina- 
tion is provided, at least enough to get us 
beyond the customary case and data retrieval 
(with follow-on sort and list) that is usually 
requested by the average analyst. In spite of 
the more obvious shortcomings of formatting and 
processing traffic for these data bases, I feel 
they are valuable whether corrected or not and 
that we should promote the use of the numerous 
facilities and software that we have at hand in 
exploiting them. Getting this done is another 
gigantic problem in itself that needs the at- 
tention of management in particular. 


The problem of formatting traffic for data- 
base storage is an old one with quite a history. 
I can remember my efforts (some 15 years or so 
ago) to have field analysts edit traffic for 
electrical transmission rather than prepare the 
detailed (complicated) TECSUMs/MATSUMs of the 
day. The idea didn't make it, but the same 
general concept of editing for data-base for- 
matting I tried to sell is now the responsibility 
of the AG-22 operator, called tagging. 


The formatting of Morse traffic has been 
tied to a number of vehicles over a long period 
of time and evolution, from TECSUMs to MATSUMs 
to STRUM to AG-22. All of these have had a 
measure of success, but now that we have managed 
to automate forwarding directly from the inter- 
cept position, our concentration should still be 
on intercept and the improvement of copy. I
believe that placing the formatting responsibi-
lity almost totally on the operator through
tagging will divert our collection from these 
objectives and generally degrade traffic quality. 
We should be able to do without some of it or 
substitute multifunctional and/or automatic 
tags and develop programs, with the objective 
of doing a better job of data-base formatting, 
that are a little less operator-dependent. 


Surely we have the expertise on hand to do 
this. We have a pretty good system working for 
us now, not just in AG-22/IATS, but in a variety 
of other areas, e.g., automated callsign pro-
jections, machine decryptions, radio wave prop-
agation, cryptanalytic diagnostics, etc. If we
can "teach" the machines to do these things, 
there is no reason why we can't get on with the 
task of formatting traffic acceptably with 
follow-on programs doing the specialized work. 
I do know that edit programs have been written
to "fix" message texts from existing data bases,
not only for decryption, but also for indexing,
diagnostics, etc., as well, and the cryptanalyst
has been generally satisfied with the product.
There have been programs developed that
scan chatter, find enciphered address groups,
decrypt them, and place them in sort field (key)
locations so they can be listed in an orderly
fashion and in context. We should be able to
expand on these techniques and do a fairly good
job of mechanizing the editing and formatting of
AG-22/IATS "take" with minimal tagging. I con-
tend that if we retain selective retrievability
in these data bases so that we can get back to
this material through identifications (case and
terminal) for the specialized processing that
needs to be done, that is sufficient.


Successful traffic analysis depends upon
traffic quality. Let's do something that will
assist the operator in making that product more
accurate, complete, and at the same time make
his job easier. It's time the machines were put
to work to serve us.


The COMINT linguist has at his disposal, in
addition to general and specialized dictionaries,
various types of language working aids that ar-
range individual words or connected text in
ways that look unusual to the linguist¹. The
purpose of this article is to acquaint the non- 
linguist, and the linguist with no previous 
COMINT experience, with the COMINT need for those 
working aids and with some of their characteris- 
tic uses and limitations. 


But before discussing these aids, it might be 
worthwhile to explain why language aids have been
created to manipulate words in ways that seem 
to differ so greatly from the normal patterns 
of spoken and written language. Unlike Moliere's 
bourgeois gentleman, who was surprised to learn 
that he had been speaking prose all his life, 
there are probably few who are surprised to 
learn that normal spoken and written language 
is unidirectional. That is, people start speak-
ing and, whether or not they have previously
organized their thoughts logically, they pro- 
duce their words one after the other in a defi- 
nite, irreversible time sequence. Or they start 
writing and, depending upon the particular lan-
guage, put the words down in a definite sequence
from left to right, right to left, top to bot-
tom, etc. This text, for example, has the words
arranged the way we like them -- and it's not 
just because a typewriter couldn't be hooked up 
to type the words boustrophedonically. 


¹The following article is a slightly revised
and updated version of the article "Machine-
Produced Language Aids," which appeared in the
NSA Technical Journal, Vol. XV, No. 3, Summer

The listener or reader perceives each string 
of words in its produced sequence (and then is 
able, of course, to scramble them at will in his
mind). He does not usually listen or read by 
jumbling the words back and forth out of time or 
space sequence. Nor does he usually listen or 
read “backwards.” Spoken or written sequences 
that are actually intended to be interpreted in 
a “backward” sense are usually contrived for 
artistic or comic effect. Such contrived se- 
quences include palindromes ("Able was I ere I 
saw Elba"), typographic tricks (caption on a 
cartoon showing two people shouting to one an- 
other as their respective cable cars pass: 

“I said wait for me at the other end” and
“dne rehto eht ta em rof tiaw dias I”)

or dialogue in animated cartoons (the hero, 
propelled at breakneck speed over a hazardous
course, stops dead to exclaim, "What a buggy 
ride!", continues on his way until he crashes, 
and then is propelled backwards over the same 
course, during which he stops to exclaim, 
"Ride buggy a what!").


Except when creating or responding to such
stylistic tricks, normal people (that is, those
not suffering from some psychological or physio-
logical impairment of the ability to produce or
to perceive speech or writing) handle language
strings in their usual sequence. But COMINT lan-
guage specialists do not deal with language data
that is "normal." In the everyday world a per-
son on the end of a telephone line who does not
hear a word perfectly can ask the person at the
other end to repeat it. The NSA transcriber ob-
viously cannot do so. In the everyday world an
interpreter, seeing a confused expression on
the face of the person for whom he is interpret-
ing, can clarify the linguistic ambiguity right
then and there. The NSA linguist has to deal
with the ambiguities as they stand ("He said
June, but I think he meant July"). In the
everyday world a person who receives an import-
ant telegram with a critical word garbled can
request a repeat. The NSA analyst can only
hope that the intended recipient will be just as
confused as he is and will request a repeat
that is also made available to the NSA analyst.
If he does not, the NSA linguist has to degarble
the text as best he can, even if that involves
his going out on a limb.


NSA linguists, accustomed to dealing with
these and related problems that arise when lis-
tening in on foreigners' conversations and
reading their foreign-language telegrams, know
that often they cannot attack a foreign-language
text in strict left-to-right order. Nonlin-
guists sometimes have the impression that, just
like a kid who can't have his dessert until he
has finished his carrots and string beans, the
NSA linguist has some professional or moral ob-
ligation not to look at the second word until
he has translated the first one, or to look at
the third until he has translated the second,
etc. But this is not true even in what might
be called normal translation work, when, for
example, a commercial (that is, non-COMINT)
technical translator is translating a completely
ungarbled text from a printed, open-source book
or magazine -- when translating a technically
complex or grammatically obscure sentence, he
may indeed have to look ahead (to the next sen-
tence, to the next page, or even to the next
chapter) for clues that will resolve the ambigu-
ities. And the situation is even more complex
for the NSA linguist, who often finds that, be-
cause of message encryption, garbling, poor
audio signal, etc., it is impossible to attack
a written text in a strict left-to-right sequence²
or to transcribe a radiotelephone conversation
from the first syllable to the last on the tape.


The NSA linguist is a kind of scientist, in the
sense that he can isolate the words he wants to
examine and can subject them to any kind of test
he wants.


²See: "Right-to-Left Text Sorts Are Not
Impossible," CRYPTOLOG,
August 1974.


The NSA linguist's job is analogous to that
of the FBI specialist in the chemical-analysis
or ballistics laboratory. But whereas the tests
run by the FBI specialist weigh and measure the
quantitative properties of physical objects,
those run by the NSA linguist assess the quali-
tative properties of words. And words are "things"
that keep changing. If an NSA linguist has a
garbled 5-letter word, how does he decide which
of the three obvious degarbles is the most likely
to have been the word intended in the original
text? If an NSA bookbreaker has an unrecovered
value in a one-part code between letter M and
Q, how does he decide which are the best guesses?
If a research analyst has a [illegible] signed
[illegible], how does he decide [illegible]
it is? Or if he has, in his traffic, an abbre-
viation with 40 possible expansions, how does
he determine which is the one that the message
originator had in mind and that the message re-
cipient will recognize immediately? (Inciden-
tally, how, in dealing with intercept text writ-
ten all in capital letters, did the NSA linguist
recognize the abbreviation in the first place?)
The answer to all these questions is the same:
he made a judgment based on his thorough know-
ledge of the language with which he is dealing
(not just a thorough "book-l'arnin'" knowledge
of the language, but a thorough knowledge of it
as it is actually used by the specific user,
with all his specific educational, occupational,
social, and other peculiarities), and based on
his submitting of the language data to validity
tests at all stages of intercept, decryption,
translation, and interpretation. These tests
that the NSA linguist subjects his language
data to are just as valid as the chemical and
spectrographic tests conducted by physical sci-
entists, but the results of the linguistic anal-
ysis do not have a nice, solid scientific look
to them. A typewriter expert can state in
court, "The letter m in the examined document
could not have been made by a Smith-Corona type-
writer" and stand ready to back up his statement
with enlarged photographs showing measurable
distinctive features. But the NSA linguist who
states with identical firmness, "The letter m
in this word must be a garble for z" usually
cannot support his findings as impressively, and
might even sound downright shifty as he brings
in such qualifiers as "usually cannot," or re-
fers -- however modestly -- to his "years of
experience," "feel for the language," or to
"letter-frequency probabilities," "contextual
incongruity," and other qualitative, rather than
quantitative, proofs.
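
Two of the questions posed above lend themselves to a small mechanical first pass, sketched here in Python. The first function lists vocabulary words that differ from a garbled word in at most one letter position. The second relies on the defining property of a one-part code, in which plaintext entries and code groups run in the same alphabetical order, so an unrecovered value lying between two recovered neighbors must sort between them. The function names, the sample vocabularies, and the single-substitution model of garbling are assumptions made for illustration; such a pass only narrows the field, and the final choice still rests on the linguist's judgment.

```python
def degarbles(garbled, vocabulary, max_errors=1):
    """Words of the same length differing in at most max_errors positions."""
    target = garbled.upper()
    hits = []
    for word in vocabulary:
        candidate = word.upper()
        if len(candidate) != len(target):
            continue
        errors = sum(a != b for a, b in zip(candidate, target))
        if errors <= max_errors:
            hits.append(candidate)
    return sorted(hits)


def candidates_between(vocabulary, lower, upper):
    """Vocabulary entries that sort strictly between two recovered neighbors."""
    lo, hi = lower.upper(), upper.upper()
    return sorted(w.upper() for w in vocabulary if lo < w.upper() < hi)
```

With a hypothetical word list of HOUSE, MOUSE, LOUSE, and HORSE, the garbled word XOUSE yields three degarbles (HOUSE, LOUSE, MOUSE), which is exactly the point at which knowledge of the target's usage has to take over.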


How, then, does the NSA linguist equip
himself when attacking COMINT text in a particu-
lar foreign language? Obviously, the first
step is to acquire fewer and newer dictionaries
and to augment them with operational language
files and specialized glossaries. But these
all list words in the normal alphabetic order
and are therefore of limited value to the crypt-
analyst or linguist trying to cope with a word with
its beginning missing, garbled, or cryptographi-
cally unrecovered. Therefore, NSA analysts spe-
cializing in language have felt the need to pre-
pare various types of language aids that list
words in other than normal dictionary order³.


The aids fall generally into three basic
categories:

• word-pattern listings,

• backward listings,

• window indexes.
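
As an illustration of why an ordering other than normal dictionary order helps with a word whose beginning is missing, here is a minimal Python sketch of a backward listing. It assumes that a backward listing is simply a word list sorted on reversed spelling, so that all words sharing an ending fall together. The function names and the sample words are hypothetical.

```python
import bisect


def build_backward_listing(words):
    """Sort words on their reversed spelling; keep (reversed, word) pairs."""
    return sorted((w.upper()[::-1], w.upper()) for w in set(words))


def lookup_ending(listing, ending):
    """Return every word in the listing that ends with the given letters."""
    key = ending.upper()[::-1]
    start = bisect.bisect_left(listing, (key,))
    found = []
    for reversed_word, word in listing[start:]:
        if not reversed_word.startswith(key):
            break  # entries sharing this ending are contiguous
        found.append(word)
    return found
```

A fragment such as "..ATION" can then be looked up directly: with NATION, STATION, RATION, MOTION, TABLE, and CABLE in the listing, `lookup_ending(listing, "ATION")` returns NATION, RATION, and STATION as adjacent entries.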

Word-Pattern Listings

Word-pattern listings are listings usually
of preselected words (sometimes including word
phrases of two or three words) that are typical
of a particular type of context and that have
a particular pattern of letter repetition. For
example, the English words AARDVARK, EEL, LLAMA,
CANNOT, and WILL each contain a doubled-letter
sequence that can be represented by the coding
AA. The words LLAMA, COMPLETE, and MILITARY
each contain a repeated letter with one inter-
vening letter, which pattern can be represented
by the coding A-A. The phrases CAN NOW and TO
OPEN each contain a doubled-letter sequence that
can be represented either by the coding AA (if
the space between words is nonsignificant) or
A-A (if the space between words is significant).
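
The AA and A-A codings described above can be checked mechanically. The following Python sketch is one way to do it, not a reconstruction of the programs actually used: it treats AA as a letter repeated with no intervening character and A-A as a letter repeated with exactly one intervening character, and it lets the caller decide whether the space between words counts. The function names and the `gap` parameter are assumptions introduced for the example.

```python
def has_repeat(text, gap, space_significant=False):
    """True if some letter recurs with exactly `gap` characters in between.

    gap=0 corresponds to the coding AA, gap=1 to the coding A-A.
    """
    chars = text.upper()
    if not space_significant:
        chars = chars.replace(" ", "")
    # Pair each character with the one gap+1 positions later.
    return any(a == b and a.isalpha() for a, b in zip(chars, chars[gap + 1:]))


def pattern_listing(words, gap, space_significant=False):
    """Alphabetical listing of the words or phrases that show the pattern."""
    return sorted(w for w in words if has_repeat(w, gap, space_significant))
```

Run over the article's own examples, `pattern_listing(words, 0)` picks out AARDVARK, CANNOT, EEL, LLAMA, and WILL, while `pattern_listing(words, 1)` picks out COMPLETE, LLAMA, and MILITARY. The phrase CAN NOW matches AA when the space is ignored and A-A when it is counted.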
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